english translation
ELR-1000: A Community-Generated Dataset for Endangered Indic Indigenous Languages
Joshi, Neha, Gogoi, Pamir, Mirza, Aasim, Jansari, Aayush, Yadavalli, Aditya, Pandey, Ayushi, Shukla, Arunima, Sudharsan, Deepthi, Bali, Kalika, Seshadri, Vivek
We present a culturally-grounded multimodal dataset of 1,060 traditional recipes crowdsourced from rural communities across remote regions of Eastern India, spanning 10 endangered languages. These recipes, rich in linguistic and cultural nuance, were collected using a mobile interface designed for contributors with low digital literacy. Endangered Language Recipes (ELR)-1000 -- captures not only culinary practices but also the socio-cultural context embedded in indigenous food traditions. We evaluate the performance of several state-of-the-art large language models (LLMs) on translating these recipes into English and find the following: despite the models' capabilities, they struggle with low-resource, culturally-specific language. However, we observe that providing targeted context -- including background information about the languages, translation examples, and guidelines for cultural preservation -- leads to significant improvements in translation quality. Our results underscore the need for benchmarks that cater to underrepresented languages and domains to advance equitable and culturally-aware language technologies. As part of this work, we release the ELR-1000 dataset to the NLP community, hoping it motivates the development of language technologies for endangered languages.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Indonesia > Bali (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning
Dou, Shaoyu, Shen, Yutian, Chen, Mofan, Wang, Zixuan, Xu, Jiajie, Guo, Qi, Shao, Kailai, Chen, Chao, Hu, Haixiang, Shi, Haibo, Min, Min, Zhang, Liwen
Large Language Models (LLMs) demonstrate significant potential but face challenges in complex financial reasoning tasks requiring both domain knowledge and sophisticated reasoning. Current evaluation benchmarks often fall short by not decoupling these capabilities indicators from single task performance and lack root cause analysis for task failure. To address this, we introduce FinEval-KR, a novel evaluation framework for decoupling and quantifying LLMs' knowledge and reasoning abilities independently, proposing distinct knowledge score and reasoning score metrics. Inspired by cognitive science, we further propose a cognitive score based on Bloom's taxonomy to analyze capabilities in reasoning tasks across different cognitive levels. We also release a new open-source Chinese financial reasoning dataset covering 22 subfields to support reproducible research and further advancements in financial reasoning. Our experimental results reveal that LLM reasoning ability and higher-order cognitive ability are the core factors influencing reasoning accuracy. We also specifically find that even top models still face a bottleneck with knowledge application. Furthermore, our analysis shows that specialized financial LLMs generally lag behind the top general large models across multiple metrics.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- (2 more...)
- Banking & Finance > Trading (1.00)
- Information Technology > Security & Privacy (0.68)
- Government (0.68)
Aligning Language Models with Real-time Knowledge Editing
Tang, Chenming, Yang, Yutong, Wang, Kexue, Wu, Yunfang
Knowledge editing aims to modify outdated knowledge in large language models (LLMs) efficiently while retaining their original capabilities. Mainstream benchmarks for knowledge editing are predominantly static and fail to keep in pace with the evolving real-world knowledge. In this work, we introduce CRAFT, an ever-evolving real-world benchmark for knowledge editing. It features well-designed paired edits for composite reasoning, and evaluates models on alias portability as well as temporal and common-sense locality, making it a challenging knowledge editing benchmark on which previous knowledge editing methods hardly achieve balanced performance. Towards flexible real-time editing, we propose KEDAS, a novel paradigm of knowledge editing alignment featuring diverse edit augmentation and self-adaptive post-alignment inference, which exhibits significant performance gain on CRAFT compared to previous methods. All of our code and data are available at https://anonymous.4open.science/r/CRAFT-KEDAS.
- North America > United States (0.14)
- Asia > China (0.05)
Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages
Naous, Tarek, Savit, Anagha, Catalan, Carlos Rafael, Guo, Geyang, Lee, Jaehyeok, Lee, Kyungdon, Dizon, Lheane Marie, Ye, Mengyu, Kothari, Neel, Singh, Sahajpreet, Masud, Sarah, Patwa, Tanish, Tran, Trung Thanh, Khan, Zohaib, Ritter, Alan, Bak, JinYeong, Sakaguchi, Keisuke, Chakraborty, Tanmoy, Arase, Yuki, Xu, Wei
As Large Language Models (LLMs) gain stronger multilingual capabilities, their ability to handle culturally diverse entities becomes crucial. Prior work has shown that LLMs often favor Western-associated entities in Arabic, raising concerns about cultural fairness. Due to the lack of multilingual benchmarks, it remains unclear if such biases also manifest in different non-Western languages. In this paper, we introduce Camellia, a benchmark for measuring entity-centric cultural biases in nine Asian languages spanning six distinct Asian cultures. Camellia includes 19,530 entities manually annotated for association with the specific Asian or Western culture, as well as 2,173 naturally occurring masked contexts for entities derived from social media posts. Using Camellia, we evaluate cultural biases in four recent multilingual LLM families across various tasks such as cultural context adaptation, sentiment association, and entity extractive QA. Our analyses show a struggle by LLMs at cultural adaptation in all Asian languages, with performance differing across models developed in regions with varying access to culturally-relevant data. We further observe that different LLM families hold their distinct biases, differing in how they associate cultures with particular sentiments. Lastly, we find that LLMs struggle with context understanding in Asian languages, creating performance gaps between cultures in entity extraction. Large Language Models (LLMs) have rapidly integrated into modern technology, serving users from diverse cultures (Adilazuarda et al., 2024). Among the vast range of text they process, LLMs frequently encounter entities such as people's names, locations, or food dishes, which are pervasive in text corpora (Wolfe & Caliskan, 2021; Pawar et al., 2025a) and often appear in user prompts (Li et al., 2024a; Wang et al., 2025). Importantly, entities carry cultural associations, making it essential for LLMs to handle culturally diverse entities fairly.
Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation
Shen, Sherrie, Wang, Weixuan, Birch, Alexandra
The faithful transfer of contextually-embedded meaning continues to challenge contemporary machine translation (MT), particularly in the rendering of culture-bound terms--expressions or concepts rooted in specific languages or cultures, resisting direct linguistic transfer. Existing computational approaches to explicitating these terms have focused exclusively on in-text solutions, overlooking paratextual apparatus in the footnotes and endnotes employed by professional translators. In this paper, we formalize Genette's (1987) theory of paratexts from literary and translation studies to introduce the task of paratextual explicitation for MT. We construct a dataset of 560 expert-aligned paratexts from four English translations of the classical Chinese short story collection Liaozhai and evaluate LLMs with and without reasoning traces on choice and content of explicitation. Experiments across intrinsic prompting and agentic retrieval methods establish the difficulty of this task, with human evaluation showing that LLM-generated paratexts improve audience comprehension, though remain considerably less effective than translator-authored ones. Beyond model performance, statistical analysis reveals that even professional translators vary widely in their use of paratexts, suggesting that cultural mediation is inherently open-ended rather than prescriptive. Our findings demonstrate the potential of paratextual explicitation in advancing MT beyond linguistic equivalence, with promising extensions to monolingual explanation and personalized adaptation.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (20 more...)
UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval
Lenc, Ladislav, Cífka, Daniel, Martínek, Jiří, Šmíd, Jakub, Král, Pavel
This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings. The models were then combined to obtain the best possible result. Our approach achieved 7th place in monolingual and 9th in cross-lingual subtasks. We used only English translations as an input to the text embedding models since multilingual models did not achieve satisfactory results. We identified the most relevant claims for each post by leveraging the embeddings and measuring cosine similarity. Overall, the best results were obtained by the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed & GPT or Mistral).
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
fact check AI at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-checked Claim Retrieval
SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval is approached as a Learning-to-Rank task using a bi-encoder model fine-tuned from a pre-trained transformer optimized for sentence similarity. Training used both the source languages and their English translations for multilingual retrieval and only English translations for cross-lingual retrieval. Using lightweight models with fewer than 500M parameters and training on Kaggle T4 GPUs, the method achieved 92% Success@10 in multilingual and 80% Success@10 in 5th in crosslingual and 10th in multilingual tracks.
Measuring Information Distortion in Hierarchical Ultra long Novel Reconstruction:The Optimal Expansion Ratio
A two stage novel generation framework (outline -> section outline -> manuscript) is widely used in long novel generation,(e.g., \textsc{DOME}, \textsc{Plan\&Write}, \textsc{Long Writer}), but study of such framework in ultra long novel(>1M words) reconstruction is little. Building on recent text compression methods (\textsc{LLMZip}, \textsc{LLM2Vec}), we conduct an information-theoretic analysis to quantify semantic distortion under different compression-expansion ratios. We examine how outline length affects information preservation. Experiments on ultra-long novels show that the optimal compression-expansion ratio significantly reduces semantic distortion compared to other non-optimal compression-expansion ratio.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
Fabbri, Alexander R., Mares, Diego, Flores, Jorge, Mankikar, Meher, Hernandez, Ernesto, Lee, Dean, Liu, Bing, Xing, Chen
Although recent Large Language Models (LLMs) have shown rapid improvement on reasoning benchmarks in English, the evaluation of such LLMs' multilingual reasoning capability across diverse languages and cultural contexts remains limited. Existing multilingual reasoning benchmarks are typically constructed by translating existing English reasoning benchmarks, biasing these benchmarks towards reasoning problems with context in English language/cultures. In this work, we introduce the Multilingual Native Reasoning Challenge ( MultiNRC), a benchmark designed to assess LLMs on more than 1,000 native, linguistic and culturally grounded reasoning questions written by native speakers in French, Spanish, and Chinese. MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance. For cultural/tradition reasoning and math reasoning with cultural relevance, we also provide English equivalent translations of the multilingual questions by manual translation from native speakers fluent in English. This set of English equivalents can provide a direct comparison of LLM reasoning capacity in other languages vs. English on the same reasoning questions. We systematically evaluate current 14 leading LLMs covering most LLM families on MultiNRC and its English equivalent set. The results show that (1) current LLMs are still not good at native multilingual reasoning, with none scoring above 50% on MultiNRC; (2) LLMs exhibit distinct strengths and weaknesses in handling linguistic, cultural, and logical reasoning tasks; (3) Most models perform substantially better in math reasoning in English compared to in original languages (+10%), indicating persistent challenges with culturally grounded knowledge.
- South America > Argentina (0.04)
- South America > Falkland Islands (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
Design of intelligent proofreading system for English translation based on CNN and BERT
Liu, Feijun, Wang, Huifeng, Wang, Kun, Wang, Yizhen
Since automatic translations can contain errors that require substantial human post-editing, machine translation proofreading is essential for improving quality. This paper proposes a novel hybrid approach for robust proofreading that combines convolutional neural networks (CNN) with Bidirectional Encoder Representations from Transformers (BERT). In order to extract semantic information from phrases and expressions, CNN uses a variety of convolution kernel filters to capture local n-gram patterns. In the meanwhile, BERT creates context-rich representations of whole sequences by utilizing stacked bidirectional transformer encoders. Using BERT's attention processes, the integrated error detection component relates tokens to spot translation irregularities including word order problems and omissions. The correction module then uses parallel English-German alignment and GRU decoder models in conjunction with translation memory to propose logical modifications that maintain original meaning. A unified end-to-end training process optimized for post-editing performance is applied to the whole pipeline. The multi-domain collection of WMT and the conversational dialogues of Open-Subtitles are two of the English-German parallel corpora used to train the model. Multiple loss functions supervise detection and correction capabilities. Experiments attain a 90% accuracy, 89.37% F1, and 16.24% MSE, exceeding recent proofreading techniques by over 10% overall. Comparative benchmarking demonstrates state-of-the-art performance in identifying and coherently rectifying mistranslations and omissions.
- Asia > Singapore (0.04)
- Asia > China > Shanxi Province (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)